Towards the Adequate Evaluation of Morphosyntactic Taggers
Abstract
There exists a well-established and almost unanimously adopted measure of tagger performance, namely, accuracy. Although it is perfectly adequate for small tagsets and typical approaches to disambiguation, we show that it is deficient when applied to rich morphological tagsets and propose various extensions designed to better correlate with the real usefulness of the tagger.
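As a rough illustration of the baseline measure the abstract refers to, the sketch below computes plain per-token accuracy. The colon-separated tag format and the example tags are hypothetical, chosen only to show that exact-match accuracy treats a tag that is wrong in a single attribute the same as a tag that is wrong altogether. It does not implement any of the extensions the paper proposes.

```python
def tagging_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag exactly matches the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical rich tags: part of speech followed by attribute values.
gold = ["subst:sg:nom:m1", "adj:sg:nom:m1", "fin:sg:ter", "prep:loc"]

# One tag differs from gold only in case (nom vs. acc).
near_miss = ["subst:sg:acc:m1", "adj:sg:nom:m1", "fin:sg:ter", "prep:loc"]

# One tag has the wrong part of speech entirely.
gross_error = ["fin:sg:ter", "adj:sg:nom:m1", "fin:sg:ter", "prep:loc"]

# Exact-match accuracy cannot tell these two outputs apart.
acc_near = tagging_accuracy(gold, near_miss)
acc_gross = tagging_accuracy(gold, gross_error)
print(acc_near, acc_gross)
```

Both outputs receive the same score even though the first preserves most of the morphological information, which is one way to see why exact-match accuracy may be less informative for rich tagsets.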
Related resources
Taggers Gonna Tag: An Argument against Evaluating Disambiguation Capacities of Morphosyntactic Taggers
Usually tagging of inflectional languages is performed in two stages: morphological analysis and morphosyntactic disambiguation. A number of papers have been published where the evaluation is limited to the second part, without asking the question of what a tagger is supposed to do. In this article we highlight this important question and discuss possible answers. We also argue that a fair eval...
Improving Morphosyntactic Tagging of Slovene Language through Meta-tagging
Part-of-speech (PoS) or, better, morphosyntactic tagging is the process of assigning morphosyntactic categories to words in a text, an important pre-processing step for most human language technology applications. PoS-tagging of Slovene texts is a challenging task since the size of the tagset is over one thousand tags (as opposed to English, where the size is typically around sixty) and the sta...
Towards robust multi-tool tagging: An ontology-based approach
In the realm of morphosyntactic annotations, ensemble combination techniques have been successfully applied to obtain more robust and more reliable linguistic analyses. Ensemble combination architectures employ multiple classifiers, e.g., part of speech taggers trained on different corpora, and combine their output, e.g., by means of a majority vote, phenomenon-dependent selection preferences or...
Linguistic variations and morphosyntactic annotation of Latin classical texts
This paper assesses the performance of three taggers (MBT, TnT and TreeTagger) when used for the morphosyntactic annotation of classical Latin texts. With this aim in view, we selected the training corpora, as well as the samples used for tests, from the texts of the LASLA database. The texts were chosen according to their ability to allow testing of the taggers' sensitivity to stylistic, diac...
Multi-source morphosyntactic tagging for spoken Rusyn
This paper deals with the development of morphosyntactic taggers for spoken varieties of the Slavic minority language Rusyn. As neither annotated corpora nor parallel corpora are electronically available for Rusyn, we propose to combine existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish and adapt them to Rusyn. Using MarMoT as tagging toolki...
Publication date: 2010